High Performance Computing in Material Sciences: Higher Level Blas in Symmetric Eigensolvers

Authors

  • Wilfried N. Gansterer
  • Dieter F. Kvasnicka
  • Christoph W. Ueberhuber
Abstract

In this report, a way to apply higher-level Blas to the tridiagonalization of a symmetric matrix A is investigated. Tridiagonalization is an important and work-intensive preprocessing step in eigenvalue computations. It is also a central part of the material sciences code Wien 97 (Blaha et al. [12]). After illustrating the drawbacks and limitations of the tridiagonalization implemented in Lapack (Anderson et al. [1]), the dependency structure of the tridiagonalization process is analyzed. It is shown that data references cannot be localized when tridiagonalization is performed in one sweep, and that Level 2 operations cannot be avoided in a one-sweep tridiagonalization algorithm. Thus a different method is considered, one that exploits the inherent potential for using higher-level Blas. By performing tridiagonalization in two successive sweeps, it is possible to use Level 3 Blas in the first sweep, which transforms A into a band matrix B. In the second sweep, B is transformed into a tridiagonal matrix T. The second sweep accounts for only a very small portion of the overall work, and for this reason the new method yields very good performance. Investigation of the backtransformation process for the corresponding eigenvectors reveals that it is preferable to calculate the eigenvectors from the band matrix B rather than from the tridiagonal matrix T.
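The two-sweep idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `full_to_band` and `band_to_tridiagonal`, the panel width `b = 2`, the test size `n = 8`, and the use of an explicitly formed QR factor per panel (instead of a blocked representation of the reflectors) are all assumptions made for readability. The first sweep performs all of its updates as matrix-matrix products. For brevity, the second sweep is written as plain dense Householder reduction, whereas a real band-to-tridiagonal sweep (for example LAPACK's `dsbtrd`) chases bulges so that it stays within the band. The last lines follow the abstract's conclusion: they compute eigenvectors directly from the band matrix B and back-transform them with Q1 only.

```python
import numpy as np


def full_to_band(A, b):
    """Sweep 1 (sketch): reduce symmetric A to band form, semi-bandwidth b.

    Each step QR-factorizes the b-column panel below the band and applies
    the orthogonal factor from both sides as matrix-matrix products, the
    kind of work Level 3 BLAS is built for. Returns B, Q1 with
    A = Q1 @ B @ Q1.T.
    """
    B = np.array(A, dtype=float)  # work on a copy
    n = B.shape[0]
    Q1 = np.eye(n)
    for k in range(0, n - b, b):
        panel = B[k + b:, k:k + b]
        # Full orthogonal factor of the panel (an assumption for clarity;
        # production codes keep the reflectors in a blocked form instead).
        Qp, _ = np.linalg.qr(panel, mode="complete")
        B[k + b:, :] = Qp.T @ B[k + b:, :]  # left update: matrix-matrix product
        B[:, k + b:] = B[:, k + b:] @ Qp    # right update: matrix-matrix product
        Q1[:, k + b:] = Q1[:, k + b:] @ Qp  # accumulate the transformation
    return B, Q1


def band_to_tridiagonal(B):
    """Sweep 2 (simplified): reduce symmetric B to tridiagonal T.

    Ignores the band structure and uses plain Householder reflections
    (rank-2, Level 2 style updates). Returns T, Q2 with B = Q2 @ T @ Q2.T.
    """
    T = np.array(B, dtype=float)
    n = T.shape[0]
    Q2 = np.eye(n)
    for k in range(n - 2):
        v = T[k + 1:, k].copy()
        alpha = -np.copysign(np.linalg.norm(v), v[0])
        v[0] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue  # column already reduced
        v /= vnorm
        # Apply H = I - 2 v v^T to rows and columns k+1..n-1.
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
        Q2[:, k + 1:] -= 2.0 * np.outer(Q2[:, k + 1:] @ v, v)
    return T, Q2


rng = np.random.default_rng(0)
n, b = 8, 2
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # random symmetric test matrix

B, Q1 = full_to_band(A, b)      # sweep 1: dense -> band
T, Q2 = band_to_tridiagonal(B)  # sweep 2: band -> tridiagonal

# Eigenvectors computed from the band matrix B, back-transformed with Q1 only.
lam, Z = np.linalg.eigh(B)
X = Q1 @ Z
print(np.allclose(np.linalg.eigvalsh(T), lam))
print(np.allclose(A @ X, X * lam))
```

Both sweeps are orthogonal similarity transformations, so A, B and T share the same eigenvalues. Because A = Q1 B Q1ᵀ, an eigenvector z of B maps to the eigenvector Q1 z of A without any involvement of Q2.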

Similar resources

Numerical Experiments with Symmetric Eigensolvers

This report describes and analyzes numerical experiments carried out with various symmetric eigensolvers in the context of the material science code Wien 97. Of particular interest are the performance improvements achieved with a new Level 3 eigensolver. The techniques which lead to a significant speedup are (1) sophisticated blocking in the tridiagonalization step, which leads to a two-sweep al...

USENIX Association Proceedings of the 4th Annual Linux Showcase

This paper presents a multi-threaded BLAS library for a dual SMP Intel computer running Linux. We present simple techniques to obtain parallelism for BLAS calls transparently from the client program. We discuss some synchronization methods available under Linux, and show performance results for a representative set of BLAS and for a high level linear algebra kernel. We then explain some key points on...

Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers

We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage...

Performance of Parallel Eigensolvers on

Many models employed to solve problems in quantum mechanics, such as electronic structure calculations, result in nonlinear eigenproblems. The solution to these problems typically involves iterative schemes requiring the solution of a large symmetric linear eigenproblem during each iteration. This paper evaluates the performance of various popular and new parallel symmetric linear eigensolvers ...

Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems

For software to fully exploit the computing power of emerging heterogeneous computers, not only must the required computational kernels be optimized for the specific hardware architectures but also an effective scheduling scheme is needed to utilize the available heterogeneous computational units and to hide the communication between them. As a case study, we develop a static scheduling scheme ...

Publication date: 1998